Bagged Boosted Trees for Classification of Ecological Momentary Assessment Data
Authors
Department of Data Science and Knowledge Engineering, Maastricht University, email: {jerry.spanakis,gerhard.weiss}@maastrichtuniversity.nl
Faculty of Psychology and Neuroscience, Maastricht University, email: [email protected]
Abstract
Ecological Momentary Assessment (EMA) data are organized in multiple levels (per subject, per day, etc.), and this particular structure should be taken into account by the machine learning algorithms used on EMA data, such as decision trees and their variants. We propose a new algorithm called BBT (standing for Bagged Boosted Trees) that is enhanced by an over/under-sampling method and can provide better estimates of the conditional class probability function. Experimental results on a real-world dataset show that BBT can benefit EMA data classification performance.

1 Background & Motivation

This work focuses on classification trees and how their ensembles can be utilized to set up a prediction environment using Ecological Momentary Assessment (EMA) data from a real-world study. EMA [8] refers to a collection of methods, used in many different disciplines, by which a research subject repeatedly reports on specific variables measured close in time to the experience and in the subject's natural environment (e.g., the experience of food craving is measured repeatedly in the same subject). EMA aims to minimize recall bias, maximize ecological validity, and allow microscopic analysis of influences on behavior in real-world contexts. EMA data have a different structure than standard data and exhibit several dependencies: for example, many samples belong to the same subject and are therefore expected to be correlated. However, most decision tree methods that deal with EMA data do not take these specificities into account.

Bagging involves having each tree in the ensemble vote with equal weight, while boosting involves incrementally building an ensemble by training each new model instance to emphasize the training instances that previous models misclassified. The major differences between bagging and boosting are that (a) boosting changes the distribution of the training data based on the performance of the classifiers created up to that point (bagging acts stochastically), and (b) bagging uses equal-weight voting, while boosting weights each classifier's vote by a function of its performance. There are limited studies on combining bagging and boosting ([10], [3], [6], [11]); however, none of these approaches have been applied to longitudinal data or take the EMA structure into account. Efforts to apply decision trees to EMA data have been made, but they mostly focus on regression tasks ([7], [4], [2]) and, moreover, do not use bagging or boosting to improve performance. The work in the current paper aims at bridging this gap by combining bagging and boosting with the longitudinal data structure.

2 BBT: The proposed algorithm

Let the training data be $x_1, \dots, x_n$ and $y_1, \dots, y_n$, where each $x_i$ is a $d$-dimensional vector and $y_i \in \{-1, 1\}$ is the associated observed class label. To justify generalization, it is usually assumed that the training data, as well as any test data, are i.i.d. samples from some population of $(x, y)$ pairs. Our goal is to predict $y_i$ given $x_i$ as accurately as possible. The first step in fitting a BBT is to select the loss function, which in the case of a classification problem is based on the logistic regression loss.
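As an illustration, a minimal sketch of this loss in the $\{-1, 1\}$ label convention is given below; the form $L(y, F) = \log(1 + e^{-2yF})$ matches the logistic link used later in Eq. (1), and the function name is ours, not the paper's:

import numpy as np

def logistic_loss(y, F):
    """Binomial (logistic) boosting loss for labels y in {-1, +1}:
    L(y, F) = log(1 + exp(-2 y F)). At F = 0 the loss is log 2."""
    return np.log1p(np.exp(-2.0 * y * F))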
After some initial parameter selection (the number of trees M to be grown in sequence, the shrinkage (or learning) rate, the size of the individual trees, and the fraction of the training data sampled), we grow the BBT on the training data by growing single Boosted Trees (BT) as follows (a code sketch of the full procedure is given at the end of this section):

• Divide the data into B (typically 5-10) subsets and construct B training data sets, each of which omits one of the B subsets (the "out-of-bag" data). Each of the B subsets is created by bootstrap sampling from the set of subjects ($p = 1, \dots, P$). To create the learning set we introduce the strategy S, according to which one observation is drawn per subject. This strategy is based on a simple rationale: when only one observation per subject is selected, the probability that different observations are used for training different trees is increased, even though the same subjects might be selected, which further reduces the similarity between trees. In this way we incorporate the advantages of both subject-based and observation-based bootstrapping into the final BBT ensemble. This approach can also be applied when the number of data points per subject is unbalanced.

• Grow B BT, one for each of the B training sets, based on the AdaBoost algorithm [1]. First let $F_0(x_i) = 0$ for all $x_i$ and initialize the weights $w_i = 1/n$ for $i = 1, \dots, n$. Then, for each of the B BT, repeat the following for $m = 1, \dots, M$:
  – Fit the decision tree $g_m$ to the training data sample using the weights $w_i$, where $g_m$ maps each $x_i$ to $-1$ or $1$.
  – Compute the weighted error rate $\epsilon_m = \sum_{i=1}^{n} w_i I\{y_i \neq g_m(x_i)\}$ and half its log-odds, $\alpha_m = \frac{1}{2} \log \frac{1 - \epsilon_m}{\epsilon_m}$.
  – Let $F_m = F_{m-1} + \alpha_m g_m$.
  – Replace the weights $w_i$ with $w_i = w_i e^{\alpha_m I\{y_i \neq g_m(x_i)\}}$ and then renormalize by replacing each $w_i$ with $w_i / \sum_i w_i$.

• Calculate the prediction error (PE) for each BT for tree sizes 1 to M from the corresponding out-of-bag data and pool across the B boosted trees. Predictions for new data are computed by first predicting with each of the component trees and then aggregating the predictions (e.g., by averaging), as in bagging.

• The minimum PE estimates the optimal number of trees $m^*$ for the BT. The estimated PE of the single BT obtained by cross-validation can thus also be used to estimate the PE of the BBT; BBT therefore requires minimal additional computation beyond the estimation of $m^*$.

• Reduce the number of trees for each BT to $m^*$.

For a classification problem, we use an estimate $p_m(x)$ of the Conditional Class Probability Function (CCPF) $p(x)$, which can be obtained from $F_m$ through a logistic link function:

$p_m(x) = p_m(y = 1 \mid x) = \frac{1}{1 + \exp(-2 F_m(x))}$    (1)

Classifying at the 1/2 quantile of the CCPF works well for binary classification problems, but in the case of EMA data, classification with unequal costs, or, equivalently, classification at quantiles other than 1/2, is sometimes needed. Strategies for correctly computing the CCPF in this setting are considered in [5], using over/under-sampling to convert a median classifier into a q-classifier.
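A minimal, illustrative Python sketch of the procedure above is given below, using scikit-learn decision trees as base learners. All function names (strategy_s_sample, fit_boosted_tree, fit_bbt, bbt_ccpf) and default parameter values are ours rather than the paper's; the shrinkage rate is omitted, and the out-of-bag set is simplified to the observations not drawn by the bootstrap rather than the B-subset construction described above.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def strategy_s_sample(subjects, rng):
    """Strategy S: bootstrap the set of subjects, then draw exactly one
    observation per drawn subject. `subjects` gives each observation's
    subject id."""
    unique_subjects = np.unique(subjects)
    drawn = rng.choice(unique_subjects, size=len(unique_subjects), replace=True)
    return np.array([rng.choice(np.flatnonzero(subjects == s)) for s in drawn])

def fit_boosted_tree(X, y, M, max_depth, rng):
    """AdaBoost with M small trees; y must be a numpy array in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                          # initialize w_i = 1/n
    trees, alphas = [], []
    for m in range(M):
        g = DecisionTreeClassifier(max_depth=max_depth,
                                   random_state=int(rng.integers(1 << 31)))
        g.fit(X, y, sample_weight=w)
        miss = g.predict(X) != y
        eps = np.clip(np.sum(w * miss), 1e-10, 1 - 1e-10)  # weighted error rate
        alpha = 0.5 * np.log((1 - eps) / eps)        # half the error's log-odds
        w = w * np.exp(alpha * miss)                 # up-weight misclassified points
        w = w / w.sum()                              # renormalize
        trees.append(g)
        alphas.append(alpha)
    return trees, alphas

def bt_score(trees, alphas, X, m_star=None):
    """F_m(x): weighted sum of the first m* component tree predictions."""
    m_star = m_star or len(trees)
    return sum(a * t.predict(X) for t, a in zip(trees[:m_star], alphas[:m_star]))

def fit_bbt(X, y, subjects, B=10, M=200, max_depth=3, seed=0):
    """Grow B boosted trees on strategy-S bootstrap samples and estimate the
    optimal ensemble size m* from the pooled out-of-bag prediction error."""
    rng = np.random.default_rng(seed)
    ensemble, oob_pe = [], []
    for b in range(B):
        idx = strategy_s_sample(subjects, rng)
        oob = np.setdiff1d(np.arange(len(y)), idx)   # out-of-bag observations
        trees, alphas = fit_boosted_tree(X[idx], y[idx], M, max_depth, rng)
        F, pe = np.zeros(len(oob)), []
        for t, a in zip(trees, alphas):              # PE for tree sizes 1..M
            F += a * t.predict(X[oob])
            pe.append(np.mean(np.sign(F) != y[oob]))
        ensemble.append((trees, alphas))
        oob_pe.append(pe)
    m_star = int(np.argmin(np.mean(oob_pe, axis=0))) + 1  # minimum pooled PE
    return ensemble, m_star

def bbt_ccpf(ensemble, m_star, X):
    """Averaged CCPF estimate, Eq. (1): p_m(y=1|x) = 1 / (1 + exp(-2 F_m(x)))."""
    return np.mean([1.0 / (1.0 + np.exp(-2.0 * bt_score(trees, alphas, X, m_star)))
                    for trees, alphas in ensemble], axis=0)

With hypothetical arrays X (n x d), y in {-1, +1}, and subjects (length n), usage would be: ensemble, m_star = fit_bbt(X, y, subjects), followed by p_hat = bbt_ccpf(ensemble, m_star, X_new).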
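For the final over/under-sampling step, one way to turn a median classifier into a q-classifier, in the spirit of [5], is to multiply the prior odds of class +1 by $k = (1-q)/q$: the posterior odds are then multiplied by the same factor, so a 1/2-threshold on the resampled data corresponds to a q-threshold on the original CCPF. The helper below illustrates this idea and is our sketch, not the paper's exact procedure:

def q_resample(X, y, rng, q=0.5):
    """Over/under-sample class +1 by the factor k = (1 - q) / q so that a
    median classifier trained on the resampled data classifies at quantile q
    of the original CCPF (k > 1 over-samples, k < 1 under-samples)."""
    k = (1.0 - q) / q
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == -1)
    pos_idx = rng.choice(pos, size=int(round(k * len(pos))), replace=True)
    idx = np.concatenate([pos_idx, neg])
    return X[idx], y[idx]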